Skip to content

perf: cache per-render arrays/LINQ in DataGrid#669

Closed
azchohfi wants to merge 4 commits into
mainfrom
azchohfi-datagrid-perf-cache
Closed

perf: cache per-render arrays/LINQ in DataGrid#669
azchohfi wants to merge 4 commits into
mainfrom
azchohfi-datagrid-perf-cache

Conversation

@azchohfi

Copy link
Copy Markdown
Collaborator

Closes #663

Problem

The DataGrid (src/Reactor/Controls/DataGrid/) — which is the grid measured by the StressPerf.ReactorGrid data-grid stress workload — rebuilt arrays and re-ran LINQ on every render, even when the underlying sort/filter/column/width state was unchanged. These per-render allocations were first-order contributors to the C# vs Rust gap (Renders/sec, Avg Reconcile, Avg Diff, Avg Memory).

Changes

All edits are confined to src/Reactor/Controls/DataGrid/. New perf-only state members are internal (covered by InternalsVisibleTo Reactor.Tests), so the public API surface is unchanged.

DataGridState.cs

DataGridComponent.cs

Intentionally not done

Behavior preservation

Sorting, filtering, pagination, column show/hide/pin/resize, and range selection are unchanged. Every mutation path bumps the matching version so caches invalidate correctly.

Validation

  • dotnet build src/Reactor/Reactor.csproj -c Release0 warnings, 0 errors (no new AOT/trim warnings).
  • dotnet test tests/Reactor.Tests9681 passed, 0 failed, including 11 new DataGridPerfCacheTests (cache invalidation on sort/filter/column/width change; O(1) lookups match the old LINQ; shared layout cache reference stability; visible-columns cache; SelectRangeByKeyCache parity with the explicit-visibleOrder path).
  • DataGrid selftests (--self-test --filter DataGrid, real WinUI window) — 0 failures (paging, scroll, edit lifecycle, edit mutation, search+sort).

Note: dotnet build Reactor.slnx -c Release fails only on three unrelated tests/perf_bench/PerfBench.* XAML projects (XamlCompiler.exe exit 1 on ARM64) — confirmed pre-existing by reproducing the identical failure with these changes stashed.

The DataGrid rebuilt arrays and re-ran LINQ on every render even when the
underlying sort/filter/column/width state was unchanged. On the data-grid
stress workload (StressPerf.ReactorGrid, the grid being benchmarked) this
dominated Renders/sec, reconcile, diff, and memory.

DataGridState.cs:
- Add SortVersion/FilterVersion/ColumnVersion counters bumped by every
  mutation path (ToggleSort, SetFilter/ClearFilter/ClearAllFilters,
  Resize/Hide/Show/Reorder/Pin), used as cache keys.
- O(1) lookups via dictionaries kept in sync with the lists:
  GetSortDirection (_sortDirByField), GetFilter (_filterByField),
  GetColumnWidth/PinColumn (_columnIndexByName), replacing per-column
  LINQ FirstOrDefault/FindIndex.
- Columns getter serves a cached _visibleColumns list (version-keyed)
  instead of Where+ToList on every access.
- GetColumnLayout(...) caches the per-column widths + a single shared
  GridDefinition keyed on ColumnVersion + shape; the header and every data
  row reuse the one definition reference so the reconciler skips re-applying
  ColumnDefinitions.
- SelectRangeByKeyCache scans the internal row-key string[] by index instead
  of materializing the whole cache into a List<RowKey> on shift-click.

DataGridComponent.cs:
- Memoize the DataRequest (also stabilizes UseDataSource deps, fixing a
  spurious per-render pagination restart) and the sort key (UseMemo).
- Root grid + data/header rows build GridElements directly from cached
  GridDefinitions (static root-row-def cache; shared column layout) instead
  of allocating string[]/double[] + per-column double.ToString each render.
- Cache the once-wired LostFocus setter array in a UseRef.

The new perf-only state members are internal (InternalsVisibleTo Reactor.Tests),
so the public API surface is unchanged.

#126 (per-row Element?[] pooling) and #131B (validation-summary cache) are
intentionally not done; see the issue for the correctness rationale.

Adds DataGridPerfCacheTests covering cache invalidation, O(1)-lookup parity
with the old LINQ, the shared layout cache's reference stability, the
visible-columns cache, and SelectRangeByKeyCache parity.

Closes #663

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi azchohfi marked this pull request as draft June 25, 2026 20:07
Comment thread src/Reactor/Controls/DataGrid/DataGridComponent.cs
Review-phase fixes for PR #669 (pr-review skill + multi-model cross-check):

- GetColumnLayout: key the width/GridDefinition cache on the caller-supplied
  columns list (reference + count), not only ColumnVersion+shape. ColumnVersion
  only tracks internal mutations (resize/hide/show/reorder/pin); a swapped
  el.Columns/auto-columns list could otherwise be served a stale, wrong-sized
  layout (and index out of range in the row renderer). [HIGH, confirmed]

- DataGridComponent: hoist both editable-grid UseRef hooks out of the
  if (el.Editable) branch so the hook call order is stable when Editable toggles
  between renders (avoids HookOrderException). [HIGH]

- Columns getter: build a fresh visible-column snapshot on invalidation instead
  of returning the internal list or clearing the cached buffer in place, so a
  previously returned IReadOnlyList is never mutated by a later column change
  (restores the snapshot semantics of the old Where(...).ToList()). [LOW]

Tests: add coverage for layout-cache invalidation on a columns-reference/count
change, content invalidation on reorder/pin/toggle-visibility, the Columns
snapshot guarantee, unknown-column lookup parity, and SelectRange reversed/
missing/empty parity. Full Reactor.Tests: 9688 passed / 0 failed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets DataGrid render-path allocation hotspots in src/Reactor/Controls/DataGrid/ by introducing version-keyed caches and O(1) lookups in DataGridState, and by memoizing / reusing stable layout objects in DataGridComponent. The intent is to reduce per-render LINQ/array churn without changing the public API surface.

Changes:

  • Add sort/filter/column version counters and dictionary-based O(1) lookups in DataGridState, plus cached visible-columns and cached column-layout (GridDefinition + widths) keyed by versions.
  • Update DataGridComponent to memoize DataRequest and the legacy sortKey, and to build GridElements directly from cached GridDefinitions (root grid + header/rows) to avoid per-render string/double arrays.
  • Add DataGridPerfCacheTests to validate cache invalidation, lookup parity with prior LINQ behavior, and range-selection parity.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
tests/Reactor.Tests/DataGridPerfCacheTests.cs Adds focused unit tests validating the new caching/versioning behavior and ensuring output parity with previous LINQ-based logic.
src/Reactor/Controls/DataGrid/DataGridState.cs Introduces version counters, lookup dictionaries, visible-columns caching, column-layout caching, and an allocation-free range selection path over the row-key cache.
src/Reactor/Controls/DataGrid/DataGridComponent.cs Memoizes request/sort key and reuses cached GridDefinitions by constructing GridElement directly, reducing per-render allocations and helping reconciler skip definition reapplication.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Cache the search TextBox's onChanged handler in a UseRef so it keeps a stable
reference across renders instead of allocating a fresh closure each time. It
captures only the stable `state`, so a once-built handler is equivalent to
rebuilding it every render. The ref is declared unconditionally (hooks must run
in a stable order; ShowSearch can toggle), mirroring the #131A LostFocus-setter
pattern.

This is the only cell/toolbar-facing handler created in the component's hook
context. Every other DataGrid handler (per-row click/expand/edit/select, header
sort/resize/reorder/pin) is built inside the static, virtualized RenderRow /
RenderHeaderRow methods, which run in VirtualList's renderItem factory with no
hook context — UseCallback/UseRef cannot be called there, so they can't be
stabilized without a separate state-backed delegate cache. On this branch the
Element skip path (Element.CanSkipUpdate) compares callback *presence*, not
delegate identity (ShallowEquals intentionally ignores delegate equality), so
those fresh closures do not currently defeat the skip path; reference-stabilizing
them only matters once the Core reference-comparison skip-path lands and is best
done as a coordinated follow-up.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown

⚡ Reactor perf comparison

Workload: StressPerf.ReactorOptimized StocksGrid · --percent 50 --duration 10 · x64 Release · median of 12 paired runs (2 warmup dropped); Δ is the mean change with a 95% CI · PR head and main built and run interleaved on the same runner.

Regression vs main baseline

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 2.82 2.77 -1.1% 95% CI [-6.1, +3.8] ≈ within noise
Avg Reconcile (ms) ↓ 127.8 129.1 +1.3% 95% CI [-2.2, +4.8] ≈ within noise
Avg Diff (ms) ↓ 116.4 118.2 +1.2% 95% CI [-2.6, +5.0] ≈ within noise
Avg Memory (MB) ↓ 287.3 288.3 -0.1% 95% CI [-0.8, +0.6] ≈ within noise

Low-mutation skip-floor (--percent 0)

At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 16.62 16.58 +0.9% 95% CI [-6.6, +8.5] ≈ within noise
Avg Reconcile (ms) ↓ 34.1 36.1 +4.7% 95% CI [-4.9, +14.4] ≈ within noise
Avg Diff (ms) ↓ 32.2 34.2 +4.4% 95% CI [-5.9, +14.8] ≈ within noise
Avg Memory (MB) ↓ 268.2 267.5 -0.1% 95% CI [-0.4, +0.2] ≈ within noise

Allocation (Reactor) — lower is better

Metric main (baseline) This PR Δ (95% CI) Status
Alloc bytes/render ↓ 5699901 5749942 +0.2% 95% CI [-1.5, +1.9] ≈ within noise
Gen0 GC / 1k renders ↓ 207.14 222.53 +1.8% 95% CI [-10.3, +13.9] ≈ within noise

Keyed-list workload (StressPerf.KeyedList, --percent 50)

A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyedReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 22.37 22.38 -0.5% 95% CI [-2.6, +1.6] ≈ within noise
Avg Reconcile (ms) ↓ 14.9 14.8 +0.5% 95% CI [-1.4, +2.4] ≈ within noise
Avg Diff (ms) ↓ 14.7 14.6 +0.5% 95% CI [-1.4, +2.3] ≈ within noise
Avg Memory (MB) ↓ 169.7 170.1 -0.2% 95% CI [-0.9, +0.4] ≈ within noise

Allocation (keyed-list) — lower is better

Metric main (baseline) This PR Δ (95% CI) Status
Alloc bytes/render ↓ 313624 313312 0.0% 95% CI [-0.4, +0.3] ≈ within noise
Gen0 GC / 1k renders ↓ 17.51 17.51 +0.5% 95% CI [-1.6, +2.6] ≈ within noise

Reconciler micro-benchmarks (PerfBench.ControlModel)

Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.

Bench main ns/op Δ ns (95% CI) main B/op Δ alloc (95% CI) Status
M1 Mount_Leaf_NoCallback 149739.1 -0.9% 95% CI [-3.7, +1.9] 1140.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M2 Mount_Leaf_OneCallback 105856.2 +3.2% 95% CI [-2.6, +9.1] 3383.3 0.0% 95% CI [0.0, 0.0] ≈ within noise
M3 Mount_Leaf_ThreeCallbacks 214066.0 +1.6% 95% CI [-1.5, +4.8] 8484.1 -0.5% 95% CI [-2.6, +1.7] ≈ within noise
M4 Dispatch_Switch_Cold 101809.0 +2.5% 95% CI [-5.0, +10.0] 1767.8 0.0% 95% CI [0.0, 0.0] ≈ within noise
M5 Dispatch_Switch_Warm 105797.8 -0.6% 95% CI [-3.9, +2.7] 1766.0 +0.7% 95% CI [-0.4, +1.9] ≈ within noise
M6 Dispatch_ExternalType 91348.4 +1.5% 95% CI [-2.2, +5.1] 987.6 +1.4% 95% CI [-0.7, +3.4] ≈ within noise
M7 Update_NoChange 55084.0 -0.1% 95% CI [-0.6, +0.5] 452.1 -4.5% 95% CI [-9.7, +0.7] ≈ within noise
M8 Update_OneLeafChanged 41746.5 +0.2% 95% CI [-0.7, +1.1] 536.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
M9 Update_AllChanged 2820576.6 +0.6% 95% CI [-0.4, +1.7] 184278.1 0.0% 95% CI [0.0, 0.0] ≈ within noise
M10 EventHandlerState_Alloc 82651.0 +2.6% 95% CI [-5.9, +11.2] 3095.2 0.0% 95% CI [-0.1, 0.0] ≈ within noise
M11 ModifierEHS_Frequency 45894.4 +0.3% 95% CI [-1.1, +1.7] 638.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M12 Pool_Rent_HotPath 117007.6 +0.3% 95% CI [-0.7, +1.2] 1099.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M13 Setters_Suppression_Scope 107.5 -0.5% 95% CI [-10.5, +9.6] 26.7 0.0% 95% CI [0.0, 0.0] ≈ within noise
M14 Dsl_Rebuild_Cascade 1505326.0 +0.9% 95% CI [-0.2, +1.9] 2231828.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
C207 ChangeHandler_DpRead_Coalesce 1217.8 -3.7% 95% CI [-6.8, -0.6] 0.6 0.0% 95% CI [0.0, 0.0] ≈ within noise
OAlloc Optional_Element_Alloc 220.0 -0.8% 95% CI [-8.0, +6.3] 528.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
OUpdate Optional_Reconciler_Update 12050.3 -0.5% 95% CI [-2.4, +1.4] 2772.3 0.0% 95% CI [0.0, 0.0] ≈ within noise

Cross-framework reference (same StocksGrid workload)

Metric vanilla WinUI3¹ Rust windows-reactor² Reactor (this PR)
Renders/sec ↑ 3.39 4.80 2.77
Avg Reconcile (ms) ↓ n/a 19.1 129.1
Avg Diff (ms) ↓ n/a 17.1 118.2
Avg Memory (MB) ↓ 263.5 197.3 288.3

↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement / ⚠️ regression require the CI to exclude 0.
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1043006548.
Generated by .github/workflows/perf-compare.yml · PR 25432e0 vs main e8572a0 · 2026-06-27T07:29:09Z · run log.

@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

@azchohfi

Copy link
Copy Markdown
Collaborator Author

Closing as not-currently-measurable: the DataGrid control path is never entered by any /perf workload (StocksGrid uses a native Grid), and re-validation on current main read textbook-flat. This is an instrument gap, not a bad change — tracked as follow-up #732 (add a DataGrid stress workload, then re-measure and revive). The implementation remains on this branch for revival.

@azchohfi azchohfi closed this Jun 27, 2026
@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

1 similar comment
@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: cache per-render arrays/LINQ in DataGrid

2 participants